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O |. Advances in Geographical Information Systems (GIS) have led 

.^^ ' to the enormous recent burgeoning of spatial-temporal databases and 

associated statistical modeling. Here we depart from the rather rich 
literature in space-time modeling by considering the setting where 
space is discrete (e.g., aggregated data over regions), but time is con- 
tinuous. Our major objective in this application is to carry out infer- 
CLj ' ence on gradients of a temporal process in our data set of monthly 

.^^ , county level asthma hospitalization rates in the state of California, 

while at the same time accounting for spatial similarities of the tem- 
j^ ' poral process across neighboring counties. Use of continuous time 

models here allows inference at a finer resolution than at which the 
data are sampled. Rather than use parametric forms to model time, 
we opt for a more flexible stochastic process embedded within a dy- 
namic Markov random field framework. Through the matrix-valued 
covariance function we can ensure that the temporal process realiza- 
tions are mean square differentiable, and may thus carry out infer- 
^iL ' ence on temporal gradients in a posterior predictive fashion. We use 

^^ I this approach to evaluate temporal gradients where we are concerned 

with temporal changes in the residual and fitted rate curves after 
accounting for seasonality, spatiotemporal ozone levels and several 
spatially-resolved important sociodemographic covariates. 
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1. Introduction. Technological advances in spatially-enabled sensor net- 

^ ' works and geospatial information storage, analysis and distribution systems 

H ■ have led to a burgeoning of spatial-temporal databases. Accounting for asso- 

- - -' ciations across space and time constitutes a routine component in analyzing 

geographically and temporally referenced data sets. The inference garnered 

through these analyses often supports decisions with important scientific 

implications, and it is therefore critical to accurately assess inferential un- 
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certainty. The obstacle for researchers is increasingly not access to the right 
data, but rather implementing appropriate statistical methods and software. 

There is a considerable literature in spatio-temporal modeling; see, for 
example, the recent book by Cressie and Wikle (2011) and the references 
therein. Space-time modeling can broadly be classified as considering one 
of the following four settings: (a) space is viewed as continuous, but time 
is taken to be discrete, (b) space and time are both continuous, (c) space 
and time are both discrete, and (d) space is viewed as discrete, but time 
is taken to be continuous. Almost exclusively, the existing literature con- 
siders the first three settings. Perhaps the most pervasive case is the first. 
Here, the data are regarded as a time series of spatial process realizations. 
Early approaches include the STARMA [Pfeifer and Deutsch (1980a, 1980b)] 
and STARMAX [Stoffer (1986)] models, which add spatial covariance struc- 
ture to standard time series models. Handcock and Wallis (1994) employ 
stationary Gaussian process models with an AR(1) model for the time se- 
ries at each location to study global warming. Building upon previous work 
in the setting of dynamic models by West and Harrison (1997), several au- 
thors, including Stroud, Miiller and Sanso (2001) and Gelfand, Banerjee and 
Gamerman (2005), proposed dynamic frameworks to model residual spatial 
and temporal dependence. 

When space and time are both viewed as continuous, the preferred ap- 
proach is to construct stochastic processes using space-time covariance func- 
tions. Gneiting (2002) built upon earlier work by Cressie and Huang (1999) 
to propose general classes of nonseparable, stationary covariance functions 
that allow for space-time interaction terms for spatiotemporal random pro- 
cesses. Stein (2005) considered a variety of properties of space-time co- 
variance functions and how these were related to process spatial-temporal 
interactions. 

Finally, in settings where both space and time are discrete there has been 
much spatiotemporal modeling based on a Markov random field (MRF) 
structure in the form of conditionally autoregressive (CAR) specifications. 
See, for example. Waller et al. (1997), who developed such models in the 
service of disease mapping, and Gelfand et al. (1998), whose interest was 
in single family home sales. Pace et al. (2000) work with simultaneous au- 
toregressive (SAR) models extending them to allow temporal neighbors as 
well as spatial neighbors. Later examples include the space-time interaction 
CAR model proposed by Schmid and Held (2004), the dynamic CAR model 
proposed by Martinez-Beneito, Lopez-Quilez and Botella-Rocamora (2008), 
the proper Gaussian MRF process models of Vivar and Ferreira (2009) and 
the latent structure models approach from Lawson et al. (2010). 

Our manuscript departs from this rich literature by considering the set- 
ting where space is discrete and time is continuous. This can be envisioned 
when, for instance, we have a collection of Ng functions of time over Ng 
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regions, but the functions are posited to be spatially associated. That is, 
functions arising from neighboring regions are believed to resemble each 
other. The functional data analysis literature [Ramsay and Silverman (1997) 
and references therein] deals almost exclusively with kernel smoothers and 
roughness-penalty type (spline) models; recent discrete-space, continuous 
time examples using spline-based methods include the works by MacNab and 
Gustafson (2007) and Ugarte, Goicoa and Militino (2010). Baladandayutha- 
pani et al. (2008) consider spatially correlated functional data modeling for 
point-referenced data by treating space as continuous. A recent review by 
Delicado et al. (2010) reveals that spatially associated functional model- 
ing of time has received little attention, especially for regionally aggregated 
data. This is unfortunate, especially given the data set we encounter here 
(see Section 2 below). 

As such, we propose a rich class of Bayesian space-time models based 
upon a dynamic MRF that evolves continuously over time. This accom- 
modates spatial processes that are posited to be spatially indexed over a 
geographical map with a well-defined system of neighbors. This continuous 
temporal evolution sets our current article apart from the existing litera- 
ture. Rather than modeling time using simple parametric forms, as is often 
done in longitudinal contexts, we employ a stochastic process, enhancing the 
model's adaptability to the data. 

The benefits of using a continuous-time model over a discrete-time model 
here are twofold. First and foremost, investigators (or, in our setting, public 
health officials) may desire understanding of the local effects of temporal 
impact at a resolution finer than that at which the data were sampled. For 
instance, despite collecting data monthly, there may be interest in making 
inference on a particular week or even a given day of that month. While 
there is a wealth of literature in this domain, dynamic space-time models 
that treat time discretely can offer statistically legitimate inference only at 
the level of the data. Second, the modeling also allows us to subsequently 
carry out inference on temporal gradients, that is, the rate of change of the 
underlying process over time. We show how such inference can be carried out 
in fully model-based fashion using exact posterior predictive distributions 
for the gradients at any arbitrary time point. 

The smoothness implications for the underlying process in this context 
are obvious. We deploy a mean square differentiable Gaussian process that 
provides a tractable gradient (or derivative) process to help us achieve these 
inferential goals. Here our goal is to detect temporal changes in the residu- 
als that remain after accounting for important covariates; significant changes 
may correspond to changes in spatiotemporal covariates still missing from 
our model. While the residuals themselves could be beneficial in detecting 
missing covariates, temporal gradients can be more useful in detecting co- 
variates that operate on much finer scales. For example, time points with 
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significantly high residual gradients are likely to point toward missing covari- 
ates whose rapid changes on a finer scale impact the outcome. On the other 
hand, the residual process estimated from discrete time models is likely to 
smooth over any patterns arising from such local behavior of covariates. 

The remainder of the manuscript is structured as follows. Section 2 de- 
scribes the data set that motivates our methodology and which we analyze 
in depth. Section 3 outlines a class of dynamic MRF indexed continuously 
over time. Section 4 provides details on the Bayesian hierarchical models 
that emerge from our rich space-time structures, while Section 5 derives the 
posterior predictive inferential procedure for the temporal gradient process, 
verified via simulation in Section 6. Section 7 describes the detailed analysis 
of our data set, while Section 8 summarizes and concludes. 

2. Data. Our data set consists of asthma hospitalization rates in the 
state of California. According to the California Department of Health Ser- 
vices (2003), millions of residents of California suffer from asthma or asthma- 
like symptoms. As many studies have indicated [e.g., English et al. (1998)], 
asthma rates are related to, among other things, pollution levels and socioe- 
conomic status (SES) — two variables that likely induce a spatiotemporal 
distribution on such rates. Weather and climate also likely play a role, as 
cold air can trigger asthma symptoms. 

The data we will analyze were collected daily from 1991 to 2008 from 
each of the 58 counties. We consider all hospital discharges where asthma 
was the primary diagnosis, which are categorized as extrinsic (allergic), in- 
trinsic (nonallergic) or other. Due to confidentiality, data for days with be- 
tween one and four hospitalizations of a specific category are missing; this 
affected 38% of our observations, including more than 50% of those from 21 
counties. To remedy this, county-specific values for these days are imputed 
using a method similar to Besag's iterated conditional modes method [Besag 
(1986)]; see the online supplement [Quick, Banerjee and Carlin (2013)] for 
details. For our analysis, the data are aggregated by month, for a total of 216 
observations per county over the 18-year period, and then rates per 100,000 
residents are computed; the conversion from counts to rates for the purpose 
of fitting Gaussian spatiotemporal models is common in the literature [see, 
e.g.. Short, Carlin and Bushhouse (2002)]. While the vast majority of rates 
are less than 20 hospitalizations per month per 100,000 people, the range of 
the rates extends from to 90. As can be seen in Figure 1, hospitalization 
for asthma demonstrates a statewide decreasing trend early in the study 
period and appears to stabilize in later years. Here, we map the raw annual 
(summed over month) hospitalization rates, which have values between 
and 340 hospitalizations per 100,000. 

We attempt to capture the effect of socioeconomic status by including 
population density in our model, using data from the 2000 U.S. Census 
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Fig. 1. Raw annual (summed over month) asthma hospitalization rates per 100,000. 
Note: the analysis performed here was conducted on the monthly level; annual aggregation 
for illustration purposes only. 



and land area measurements from the National Association of Counties. To 
account for pollution, we use data from the Air Resources Board of the Cal- 
ifornia Environmental Protection Agency which counts the number of days 
in each month with average ozone levels above 0.07 ppm over 8 consecutive 
hours, the state standard. Because our ozone data is compiled at the air 
basin level, county-specific values are calculated by taking the maximum 
value of all air basins that the county belonged to. Generally, ozone levels 
are highest during the summer months, with the highest values in southern 
California and the Central Valley region, and show little variation between 
years. As hospitalization rates are higher among youth and the black pop- 
ulation, county-level covariates for percent under 18 and percent black are 
also included. These demographic covariates both have their highest values 
in southern California, though counties in the Central Valley region also 
have larger black populations. 

3. Areally referenced temporal processes. As mentioned above, our 
methodological contribution is a modeling framework for areally referenced 
outcomes that, it can be reasonably assumed, arise from an underlying 
stochastic process continuous over time. To be specific, consider a map of 
a geographical region comprising Ng regions that are delineated by well- 
defined boundaries, and let Yi(t) be the outcome arising from region i at 
time t. For every region i, we believe that Yi{t) exists, at least conceptually, 
at every time point. However, the observations are collected not continuously 
but at discrete time points, say, T = {ti,t2, ■ ■ ■ ,tNt}- For the time being, we 
will assume that the data comes from the same set of time points in T 
for each region. This is not necessary for the ensuing development, but will 
facilitate the notation. 
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A spatial random effect model for our data assumes 

Y^{t) = fiiit) + Zi{t) + e,(t), e,(t) '^ N{0,tI) 

(1) 

fori = l,2,...,iV„ 

where /ii(t) captures large scale variation or trends, for example, using a 
regression model, and Zi{t) is an underlying areally-referenced stochastic 
process over time that captures smaller-scale variations in the time scale 
while also accommodating spatial associations. Each region also has its own 
variance component, t^, which captures residual variation not captured by 
the other components. 

The process Zi{t) specifies the probability distribution of correlated space- 
time random effects while treating space as discrete and time as continuous. 
We seek a specification that will allow temporal processes from neighboring 
regions to be more alike than from nonneighbors. As regards spatial associ- 
ations, we will respect the discreteness inherent in the aggregated outcome. 
Rather than model an underlying response surface continuously over the 
region of interest, we want to treat the Zj(t)'s as functions of time that are 
smoothed across neighbors. 

The neighborhood structure arises from a discrete topology comprising a 
list of neighbors for each region. This is described using an Ns x Ng adja- 
cency matrix W = {wij}, where Wij = if regions i and j are not neighbors 
and Wij = c 7^ when regions i and j are neighbors, denoted by i ^ j- By 
convention, the diagonal elements of W are all zero. To account for spatial 
association in the Zj(t)'s, a temporally evolving MRF for the areal units at 
any arbitrary time point t specifies the full conditional distribution for Zi{t) 
as depending only upon the neighbors of region i, 

(2) p^zM{ZjMt)})-N(Y,c^^z,{t),^), 

J ^ 

where Wi-^ = '^jr^iWij, a^ > 0, and a is a propriety parameter described be- 
low. This means that the A'^s x 1 vector Z(i) = (Zi(i), Z2(t), . . . , Z^^ {t)) fol- 
lows a multivariate normal distribution with zero mean and a precision ma- 
trix ^ (Z) — aW) , where D is a diagonal matrix with Wi+ as its ith diagonal 
elements. The precision matrix is invertible as long as a € (1/A(i), 1/A(„)), 
where An) (which can be shown to be negative) and A(„) (which can be 
shown to be 1) are the smallest (i.e., most negative) and largest eigenvalues 
of D^^''^WD~^''^, respectively, and this yields a proper distribution for Z(i) 
at each time point t. 

The MRF in (2) does not allow temporal dependence; the Z(i)'s are inde- 
pendently and identically distributed as A(0, a^{D — aW)~^). We could al- 
low time- varying parameters a^ and at so that Z(t) ~ A(0, (Jt{D — atW)^^) 
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for every t. If time were treated discretely, then we could envision dynamic 
autoregressive priors for these time- varying parameters, or some transfor- 
mations thereof. However, there are two reasons why we do not pursue this 
further. First, we do not consider time as discrete because that would pre- 
clude inference on temporal gradients, which, as we have mentioned, is a 
major objective here. Second, time-varying hyperparameters, especially the 
aj's, in MRF models are usually weakly identified by the data; they permit 
very little prior-to-posterior learning and often lead to over-parametrized 
models that impair predictive performance over time. 

Here we prefer to jointly build spatial-temporal associations into the 
model using a multivariate process specification for Z(i). A highly flexi- 
ble and computationally tractable option is to assume that Z(i) is a zero- 
centered multivariate Gaussian process, GP{0,Kz{-,-))^ where the matrix- 
valued covariance function [e.g., ^^cross-covariance matrix function," Cressie 
(1993)] Kzit,u) = cov{Z(t), Z(u)} is defined to be the N^ x A^^ matrix with 
(i,j)th entry cov{Zi{t), Zj{u)} for any {t,u) S K"*" x ^R'^. Thus, for any two 
positive real numbers t and u, Kz{t,u) is an Ng x Ng matrix with (i, j)th el- 
ement given by the covariance between Zi{t) and Zj{u). These multivariate 
processes are stationary when the covariances are functions of the separa- 
tion between the time points, in which case we write Kz{t,u) = Kz{A), and 
fully symmetric when Kz{t,u) = Kz{\^\), where A = t — u. For a detailed 
exposition on covariance functions, see Chapter 7 of Banerjee, Gelfand and 
Sirmans (2003); Gelfand and Banerjee (2010) and Gneiting and Guttorp 
(2010) also provide overviews for continuous settings. 

To ensure valid joint distributions for process realizations, we use a con- 
structive approach similar to that used in linear models of coregionalization 
(LMC) and, more generally, belonging to the class of multivariate latent pro- 
cess models [see Section 7.2 of Banerjee, Gelfand and Sirmans (2003)]. We 
assume that Z(i) arises as a (possibly temporally- varying) linear transfor- 
mation Z(t) = A(t)v(t) of a simpler process v(i) = {vi{t),V2{t), . . . ,UAr^(t))'^, 
where the Vi{t)'s are univariate temporal processes, independent of each 
other, and with unit variances. This differs from the conventional LMC ap- 
proach based on spatial processes, which treats space as continuous. The 
matrix- valued covariance function for v(i), say, K^{t,u), thus has a sim- 
ple diagonal form and Kz{t,u) = A{t)Kv{t,u)A{u)'^ . The dispersion matrix 
for Z is T,z = -ASv^^, where „4 is a block-diagonal matrix with A{tj)^s as 
blocks, and Sv is the dispersion matrix constructed from K^{t,u). Con- 
structing simple valid matrix-valued covariance functions for v(t) automat- 
ically ensures valid probability models for Z(i). Also note that for t = u, 
Kv{t,t) is the identity matrix so that Kz{t,t) = A{t)A{t)'^ and A{t) is a 
square-root (e.g., obtained from the triangular Cholesky factorization) of 
the matrix- valued covariance function at time t. 
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The above framework subsumes several simpler and more intuitive speci- 
fications. One particular specification that we pursue here assumes that each 
Vi{t) follows a stationary Gaussian Process GP(0,p(-, ■; (?!))), where p{-,-;(p) 
is a positive definite correlation function parametrized by [e.g., Stein 
(1999)], so that cov {vi{t),Vi{u)) = p{t,u;(p) for every i = 1,2, . . . , A^^ for all 
nonnegative real numbers t and u. Since the Vi{t) are independent across i, 
cov{vi (t) , Vj {u)} = for i^ j. 

The matrix-valued covariance function for Z(t) becomes Kz{t,u) = p{t,u; 
4>)A(t)A[u)'^ . If we further assume that A{t) = Ais constant over time, then 
the process Z(t) is stationary if and only if v(i) is stationary. Further, we 
obtain a separable specification, so that Kz{t,u) = p{t,u;(f))AA'^ . Letting 
A be some square-root (e.g., Cholesky) of the Ns x Ng dispersion matrix 
a^{D — aW)~^ and R{4>) be the Nt x Nt temporal correlation matrix having 
(i,j)th element p(ti,tj;(f)) yields 

Kzit,u)=a^p{t,u;ct)){D-aW)~^ and 

(3) 

S^ = R{cj>) ®a'^{D- aW)-^. 

It is straightforward to show that the marginal distribution from this con- 
structive approach for each Z(tj) is iV(0, a^{D — aW)~^), the same marginal 
distribution as the temporally independent MRF specification in (2). There- 
fore, our constructive approach ensures a valid space-time process, where 
associations in space are modeled discretely using a MRF, and those in time 
through a continuous Gaussian process. 

This separable specification is easily interpretable, as it factorizes the dis- 
persion into a spatial association component (areal) and a temporal compo- 
nent. Another significant practical advantage is its computational feasibility. 
Estimating more general space-time models usually entails matrix factoriza- 
tions with 0{NgNf) computational complexity. The separable specification 
allows us to reduce this complexity substantially by avoiding factorizations 
of NsNt X NgNf matrices. One could design algorithms to work with matri- 
ces whose dimension is the smaller of Ng and Nt, thereby accruing massive 
computational gains. 

More general models using this approach are introduced and discussed in 
the online supplement [Quick, Banerjee and Carlin (2013)], but since they 
do not offer anything new in terms of temporal gradients, we do not pursue 
them in the remainder of this paper. 

4. Hierarchical modeling. In this section we build a hierarchical model- 
ing framework to analyze the data in Section 2 using the likelihood from our 
spatial random effects model in (1) and the distributions emerging from the 
temporal Gaussian process discussed in Section 3. The mean pi{t) in (1) is 
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often indexed by a parameter vector (3, for example, a linear regression with 
regressors indexed by space and time so that fii{t;(3) = Xj(i)'^/3. 
The posterior distributions we seek can be expressed as 

p(6>,Z|Y)ocp(0)xIG(a2|a,,6,)x | J];iG(r2|G,,6,) J x N{p\fif3,J:^) 

X Beta.{a\aa,ba) 
(4) xN{Z\0,R{^)(^a'^{D-aW)-^) 

Nt Ns 



X 



^i )i 



where 6 = {0, a, cr^, /3, r^, t|, . . . , r^ } and Y is the vector of observed out- 
comes defined analogous to Z. The parametrizations for the standard den- 
sities are as in Carlin and Louis (2009). We assume all the other hyperpa- 
rameters in (4) are known. 

Recall the separable matrix- valued covariance function in (3). The corre- 
lation function p{-;(j)) determines process smoothness and we choose it to 
be a fully symmetric Matern correlation function given by 

(5) />(t,n;(/)) = p(A;0) = ^^^^|^^^_^ (2v^|A|<Ai)'^^/Cfe(2v^|A|0i), 

where (p = {(^1,(^2}, A = t — u, r(-) is the Gamma function, ICfj,^{-) is the 
modified Bessel function of the second kind, and <j)i and (p2 are nonnegative 
parameters representing rate of decay in temporal association and smooth- 
ness of the underlying process, respectively. 

We use Markov chain Monte Carlo (MCMC) to evaluate the joint pos- 
terior in (4), using Metropolis steps for updating and Gibbs steps for 
all other parameters, details of which are shown in the supplemental arti- 
cle [Quick, Banerjee and Carlin (2013)]. Sampling-based Bayesian inference 
seamlessly delivers inference on the residual spatial effects. Specifically, if 
to is an arbitrary unobserved time point, then, for any region i, we sample 
from the posterior predictive distribution p(Z,(to)|Y) = J p{Zi{tQ)\Zi,6)p{6, 
Z\Y)d6 dZ. This is achieved using composition sampling: for each sampled 
value of {0,Z}, we draw Zi{to), one for one, from p{Zi{to)\Z,6), which is 
Gaussian. Also, our sampler easily adapts to situations where Yi(t) is missing 
(or not monitored) for some of the time points in region i. We simply treat 
such variables as missing values and update them, from their associated full 
conditional distributions, which of course are N{-Ki{t)'^(3 + Zi{t),Tf). We as- 
sume that all predictors in Xj(t) will be available in the space-time data 
matrix, so this temporal interpolation step for missing outcomes is straight- 
forward and inexpensive. 
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Model checking is facilitated by simulating independent replicates for 
each observed outcome: for each region i and observed time point tj, we 
sample from p(yrep,i(ti)|Y) = /iV(y,ep,i(t,)|xi(tj)^/3 + Zi(tj),r2)p(/3, Zi(tj), 
Tf\Y)df3dZi{tj)dTf, where p{/3, Zi{tj),Tlf\Y) is the marginal posterior dis- 
tribution of the unknowns in the likelihood. Sampling from the posterior pre- 
dictive distribution is straightforward, again, using composition sampling. 

5. Gradient analysis. Our primary goal is to carry out statistical in- 
ference on temporal gradients with data arising from a temporal process 
indexed discretely over space. We will do so using the notions of smooth- 
ness of a Gaussian process and its derivative. Adler (2009), Mardia et al. 
(1996) and Banerjee and Gelfand (2003) discuss derivatives (more gener- 
ally, linear functionals) of Gaussian processes, while Banerjee, Gelfand and 
Sirmans (2003) lay out an inferential framework for directional gradients 
on a spatial surface. Most of the existing work on derivatives of stochastic 
processes deal either with purely temporal or purely spatial processes [see, 
e.g., Banerjee (2010)]. Here, we consider gradients for a temporal process 
indexed discretely over space. 

Assume that {Zi{t) :t G SR"*"} is a stationary random process for each re- 
gion i} The process is L2 (or mean square) continuous at to if hmi_>.i(j E\Zi{t) - 
■^i(^o)P = 0. The notion of a mean square differentiable process can be for- 
malized using the analogous definition of total differentiability of a function 
in a nonstochastic setting [see, e.g., Banerjee and Gelfand (2003)]: Zi{t) is 
mean square differentiable at to if it admits a first order linear expansion for 
any scalar /i, 

(6) Ziito + h) = Ziito) + hZ^it) + o{h) 

in the L2 sense as /i — > 0, where we say that -^Zi{t) = Z^[to) is the gradient 
or derivative process derived from the parent process Zi{t). In other words, 
we require 

(60 lim^f^l(^°±4^^iM_zKto)V = 0. 

Equations (6) and (6') ensure that mean square differentiable processes are 
mean square continuous. 

For a univariate stationary process, smoothness in the mean square sense 
is determined by its covariance or correlation function. A stationary multi- 
variate process Z(i) with matrix- valued covariance function Kz{^) will ad- 
mit a well-defined gradient process Z'(t) = {Z[{t), . . . , Z^ (i))'^ if and only if 
iC^(O) exists, where K'^[Q) is the element-wise second-derivative of Kz{^) 
evaluated at A = 0. 



^ Stationarity is not required. We only use it to ensure smoothness of realizations and 
to simplify forms for the induced covariance function. 
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A Gaussian process with a Matern correlation function has sample paths 
that are \(j)2 — 1] times differentiable. As (^2 — ^ oo, the Matern correlation 
function converges to the squared exponential (or the so-called Gaussian) 
correlation function, which is infinitely differentiable and leads to acute over- 
smoothing. When (j)2 = 0.5, the Matern correlation function is identical to 
the exponential correlation function [see, e.g.. Stein (1999)]. To ensure that 
the underlying process is differentiable so that the gradient process exists, 
we need to restrict (j)2 > 1- However, letting 02 > 2 usually leads to over- 
smoothing, as the data can rarely distinguish among values of the smooth- 
ness parameter greater than 2. Hence, we restrict 02 £ (l, 2]. We could either 
assign a prior on this support or simply fix (p2 somewhere in this interval. 
Since it is difficult to elicit informative priors for the smoothness parameter, 
we would most likely end up with a uniform prior. In our experience, not 
only does this deliver only modest posterior learning and lead to an increase 
in computing (both in terms of MCMC convergence and estimating the re- 
sulting correlation function and its derivative) , but the substantive inference 
is almost indistinguishable from what is obtained by fixing 02. 

As such, in our subsequent analysis we fix 02 = 3/2, which has the side 
benefit of yielding the closed form expression p(A;0i) = (1 + 0i|A|) x 
exp(— 0i|A|). The first and second order derivatives for the matrix- valued 
covariance function in (3) can now be obtained explicitly as 

K'ziA) = -CTV?Aexp(-0i|A|)(Z) - aW)'^ and 
(7) 

Turning to inference for gradients, we seek the joint posterior predictive 
distribution, 



p(Z'(to)|Y)= / p(Z'(io)|Y,Z,0)p(Z|0,Y)p(0|Y)d0dZ 
(8) 

= j p{z'{tQ)\z,e)p{z\e,Y)p{e\Y)dedz, 

where the second equality follows from the fact that the gradient process is 
derived entirely from the parent process and so p(Z'(io)|Y, Z,0) does not 
depend on Y. 

We evaluate (8) using composition sampling. Here, we first obtain 6^ ' ,0^ ' , 
...,0(A^) ~p(0|Y) and Z^^) ~ p(Z|0(^\ Y),i = 1,2, . . . ,M, where M is the 
number of (post-burn-in) posterior samples. Next, for each j we draw Zi^^' ~ 
p{Z\e^^\Y), and finally Z'{toY^'^ r^ p{Z'{to)\Z^^\e'-^^). The conditional dis- 
tribution for the gradient can be seen to be multivariate normal with mean 
and variance-covariance matrix given by 

;i^,|^,, = cov(Z'(io),Z)var(Z)~^Z = -(/s:^)'^S^^Z and 
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where S^^ = ^i?(0)~^ (E>{D- aW) and {K'^)'^ is an Ns x NgNt block matrix 
whose jth block is given by the Ng x Ng matrix K'^{Aqj), with Aqj = tj — to- 
Note that Tiz'\z,e is an NgNt x NgNt matrix, but we can use the properties 
of the MRF to only invert Nf x Nt matrices. 

6. Simulation studies. To validate our model's ability to correctly esti- 
mate both our model parameters and the underlying temporal gradients, we 
have constructed two separate simulation studies using the Ns = 58 counties 
of California as our spatial grid and Nt = 50 observations per county, where 
T = {1, 2, . . . , 50}. Each simulation study consists of 100 data sets comprised 
of 2900 observations generated from (1), where fii{t) = Xj(t)-^/3, using the 
same parameter values, and our results are based on 5000 MCMC samples 
after a burn-in period of 5000 iterations. 

In an effort to obtain simulated outcomes comparable to those from our 
real data, our first simulation study uses an intercept and the four covariates 
described in Section 2, and we set the 5x1 vector, /3, as the least squares 
estimates from our real data. We also set (/> = 1, a = 0.90, and a'^ = 18, which 
are then used to generate true values for Z, while our rf are drawn from an 
inverse Gamma distribution centered at 1 with modest variance. For each of 
the 100 simulated data sets, we constructed 95% Bayesian credible intervals 
for each parameter and recorded the number of times they included their 
true values (i.e., their "frequentist coverage"). We found this coverage to be 
between 93-97% for the 5 /3's, about 87% for the random effect variance <t^ 
and around 90% on the average for the 58 rf^s, with the majority of them 
having 95% coverage. Coverage was poor for r? < 0.15; in situations where 
small variances are to be expected, this issue could be avoided or alleviated 
by rescaling the data or specifying a prior with a larger mass near 0, re- 
spectively. The spatiotemporal random effects, Z, also enjoyed satisfactory 
coverage; the average coverage over the 2900 space-time random effects was 
around 95.5%. By contrast, the coverage for the propriety parameter, a, and 
the spatial range parameter, (j), reveal biases, with coverages less than 50%. 
This is not entirely unexpected, as spatial and temporal range parameters of 
this type are known to be weakly identified by the data [e.g., Zhang (2004)]. 
Furthermore, the biases for (p and a are not substantial, with their posterior 
medians only 8% above and 5% below their true values, respectively. In an 
effort to verify the robustness of our model to these biases, we repeated the 
simulation with both (p and a fixed at their true values and were able to 
reproduce our results. 

Having demonstrated the ability of our model to correctly estimate model 
parameters, the focus of our second simulation study is to validate the theory 
of our temporal gradient processes. To do this, we assumed 

(9) Yi{tj)'^N(5 + xa*sm(^-A + x,2*cos(^-A ,Tf 
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Fig. 2. Spatiotemporal random effects and temporal gradients for a region based on one 
data set from the second simulation study. Solid black lines denote true sinusoidal curves 
based on the model m equation (9), while gray bands represent 95% credible intervals. 



where xn is the ith county's percent black and Xj2 is the ith county's ozone 
level from April 1991, as described in Section 2; this was done in order to 
induce spatial clustering. As there was no evidence of an association between 
the coverage of the random effects, Z, and the region-specific variance pa- 
rameters, values of Tf were generated from a Uniform(0.5,2.0) distribution 
in order to avoid the extreme values of the inverse Gamma and focus our 
attention on the random effects themselves. After generating 100 data sets 
based on these parameters, we then modeled the data using only an inter- 
cept, leaving the spatiotemporal random effects to capture the sinusoidal 
curve, and conducted the gradient analysis at the midpoints of each time 
interval. Figure 2 displays the true spatiotemporal random effects and tem- 
poral gradients for a particular region, along with their 95% CI estimated 
from one of the 100 data sets. As can be seen, our Gaussian process model 
accurately estimates both the random effects and the temporal gradients. 
Across all 100 data sets, 98.3% of the the theoretical gradients derived using 
elementary calculus were covered by their respective 95% CI, confirming the 
validity of the gradient theory derived in Section 5. 

7. Data analysis. As first mentioned in Section 2, our data set is com- 
prised of monthly asthma hospitalization rates in the counties of California 
over an 18-year period. As such, A'j = 12-18 = 216, and we will again use 
tj = J = 1, 2, . . . , Nt- The covariates in this model include population den- 
sity, ozone level, the percent of the county under 18 and percent black. 
Population-based covariates are calculated for each county using the 2000 
U.S. Census, thus, they do not vary temporally. However, the covariate for 
ozone level is aggregated at the air basin level and varies monthly, though 
show little variation annually. In order to accommodate seasonality in the 
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Table 1 

Comparisons between our areally referenced Gaussian process model and the three 

alternatives, pn is a measure of model complexity, as it represents the effective number 

of parameters. Smaller values of DIG and Dawid-Sebastiam (D~S) scores indicate a 

better trade-off between in-sample model fit and model complexity 

PD DIG* D S* 



Simple linear regression 


79 


9894 


16,166 


Random intercept and slope 


165 


4347 


10,403 


CAR model 


117 


7302 


13,436 


Areally referenced Gaussian process 


5256 









* 
Both Die and D-S shown are standardized relative to our areally referenced Gaussian 

Process model. 

data, monthly fixed effects are included, using January as a baseline. Thus, 
after accounting for the monthly fixed effects and the four covariates of 
interest, Xj(i) is a 16 x 1 vector. 

To justify the use of the model we've described, we compare it to three 
alternative models using the DIG criterion [Spiegelhalter et al. (2002)] and a 
predictive model choice criterion using strictly proper scoring rules proposed 
by Gneiting and Raftery [(2007) equation (27)]. Following Gzado, Gneiting 
and Held (2009), we refer to this as the Dawid-Sebastiani (D-S) score [Dawid 
and Sebastiani (1999)]. These models are all still of the form 

Y,{t) = ^i{typ + Zi{t) + Siit), ei{t) ~ iV(0, rf) 
(10) 

fori = l,2,...,iV„ 

but with different Zi(t). Our first model is a simple linear regression model 
which ignores both the spatial and the temporal autocorrelation, that is, 
Zi{t) = yi,t. The second model allows for a random intercept and random 
temporal slope, but ignores the spatial nature of the data, that is, here 

Zi{t) = aoi + aiit, where aki ~' A^(0,cj|), for fc = 0, 1. In this model, to 
preserve model identifiability, we must remove the global intercept from our 
design matrix, Xj(t). Our third model builds upon the second, but introduces 
spatial autocorrelation by letting a^ = (a^i, . . . , akNsY ~ GAR((T^), A; = 0, 1. 
The results of the model comparison can be seen in Table 1, which indicates 
that our Gaussian process model has the lowest DIG value and D-S score, 
and is thus the preferred model and the only one we consider henceforth. The 
surprisingly large po for the areally referenced Gaussian process model arises 
due to the very large size of the data set (58 counties x 216 time points). 

The estimates for our model parameters can be seen in Table 2. The coef- 
ficients for the monthly covariates indicate decreased hospitalization rates in 
the summer months, a trend which is consistent with previous findings. The 
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Table 2 
Parameter estimates for asthma hospitalization data, where estimates for r^ re-present 



the median (95% CI) of the r^ ,i' 



,Ns=58 



Parameter 



Median (95% CI) 



Parameter 



Median (95% CI) 



Po (Intercept) 


9.17 


8.93, 9.42) 


Pio (July) 


-3.78 


(-4.21, -3.37) 


l3i (Pop Den) 


0.60 


0.49, 0.70) 


Pii (August) 


-3.58 


(-4.02, -3.13) 


P2 (Ozone) 


-0.18 


-0.28, -0.08) 


P12 (September) 


-1.96 


(-2.37, -1.54) 


03 (% Black) 


1.24 


1.15, 1.34) 


Pi3 (October) 


-1.36 


(-1.73, -1.00) 


Pi (% Under 18) 


1.12 


1.01, 1.24) 


Pi4 (November) 


-0.71 


(-1.02, -0.42) 


Ps (February) 


-0.25 


-0.46, -0.04) 


/3i5 (December) 


0.63 


(0.41, 0.86) 


Pe (March) 


-0.21 


-0.48, 0.07) 


^ 


0.90 


(0.84, 0.97) 


pr (April) 


-1.47 


-1.81, -1.12) 


OL 


0.77 


(0.71, 0.80) 


Ps (May) 


-1.17 


-1.53, -0.8) 


a^ 


21.52 


(20.18, 23.06) 


p9 (June) 


-2.79 


-3.21, -2.4) 


f.2 


3.32 


(0.18, 213.16) 



coefficients for population density, percent under 18 and percent black are all 
significantly positive, also as expected. The coefficient for ozone level is sig- 
nificantly negative, however, which is surprising but consistent with the pat- 
terns in the monthly trends for both hospitalization rates and ozone levels. 
This result may also be confounded by the absence of other climate-related 
factors and the sensitivity of asthma admissions to acute weather effects. 

There is a large range of values for the county-specific residual variance 
parameters, rf . Perhaps not surprisingly, the magnitude of these terms seems 
to be negatively correlated with the population of the given counties, demon- 
strating the effect a (relatively) small denominator can have when computing 
and modeling rates. The strong spatial story seen in the maps is reflected 
by the size of o"^ compared to the majority of the rf . There is also relatively 
strong temporal correlation, with (f) = 0.9 corresponding to p{ti,tj;<j)) > 0.4 
for \tj — ti\ less than 2 months. 

Maps of the yearly (averaged across month) spatiotemporal random ef- 
fects can be seen in Figure 3. Since here we are dealing with the residual 
curve after accounting for a number of mostly nontime- varying covariates, it 
comes as no surprise that the spatiotemporal random effects capture most 
of the variability in the model, including the striking decrease in yearly hos- 
pitalization rates over the study period. It also appears that our model is 
providing a better fit to the data in the years surrounding 2000, perhaps 
indicating that we could improve our fit by allowing our demographic co- 
variates to vary temporally. Our model also appears to be performing well in 
the central counties, where asthma hospitalization rates remained relatively 
stable for much of the study period. 

In the top panel of Figure 4, we compare the monthly temporal pro- 
files of the random effects for Los Angeles and San Francisco Counties. For 
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Fig. 3. Spatial random effects for asthma hospitalization data, by year. 

Los Angeles County, the spatiotemporal random effects (top-left panel) de- 
crease at a consistent, moderate rate throughout the length of the study 
with several large spikes prior to 2000. In contrast, San Francisco County's 
random effects (top-right) have fewer and less dramatic spikes. In addition, 
San Francisco County appears to have had a changepoint in its spatiotem- 
poral random effects around 2000, where they transition from a fairly steady 
decline to a period of lower variability and very little mean change. Further 
investigation may reveal a corresponding change in social, environmental or 
health care reimbursement policy. The bottom-left panel shows the temporal 
trend of the gradients in Los Angeles County, which reveal the large degree 
of variability in the random effects. In fact, as more clearly shown in the 
bottom-right panel of Figure 4, the September to October gradient was sig- 
nificantly positive five times between 1995 and 2001, and three times during 
this period (1995, 1997 and 1999) the November to December gradients were 
significantly positive, but were immediately followed by significantly nega- 
tive gradients from December to January, a pattern that is seen throughout 
the region. 

A strength of using a continuous-time model for these data is that it seam- 
lessly permits prediction at a finer resolution than that of the observed data. 
Upon seeing the significant gradients in Los Angeles County in November 
and December of 1995, public health officials may ask for a more detailed 
report than a monthly aggregation can provide. If a discrete-time model 
were used, researchers would be required to refit the model, pre-specifying 
at which unobserved time points to conduct inference; however, with this 
model, we can use the posterior predictive distribution to interpolate values 
at any time. As a demonstration of this, Figure 5 displays the predicted 
daily values (solid line) and 95% CI bands (dashed lines) every 3 days dur- 
ing the period November 15, 1995 to January 15, 1996, plotted against the 
true observed rates (open circles). Despite substantial noise in the data and 



TEMPORAL GRADIENTS IN REGIONALLY AGGREGATED DATA 



17 



spatlot&nnsnoral Rjindom Eir^t 
Los An3^«s Counlv 



SfMitiolefripora] Random Effect 
S&Ji Francisco Coun^ 




T'empDralGradieri 
Los Angeles County 



Temporal Gradien Ls betviwcn 1 995 and 2D0 1 
Los An ge^les County 



1 Uhiiri ijii 1 :L .yjij L. 


2 jiAXkkui^i.i-'iJi'^ 








Fig. 4. Comparison between the spatiotemporal random effects in Los Angeles and San 
Francisco Counties, and an investigation of temporal gradients m Los Angeles County. 
Point estimates in black and corresponding 95% CI bands in gray. Figures in the top 
panel illustrate the differences in the temporal trends of the random effects between the two 
counties. The bottom-left figure displays the temporal gradients computed between months 
in Los Angeles County, and the bottom-right figure displays the subset of the gradients 
which are further described in the text. 

modeling based solely on the aggregate rates for each month (and assigning 
that value to the temporal midpoint of each month), our predictions and 
95% CI bands perform reasonably well. 

As our data are aggregated monthly, we felt it was also important to in- 
vestigate the gradients on a month-to-month basis over the course of the 
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Fig. 5. Posterior predicted curves (and 95% credible bounds) for the daily asthma hos- 
pitalization rates in Los Angeles County between November 15, 1995 to January 15, 1996. 
This county and interval was selected due the presence of a significantly positive gradient 
between November and December and a significantly negative gradient between December 
and January. The true hospitalizations are also shown for comparison purposes, though 
the model was fit using only the monthly aggregates. 
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Fig. 6. Temporal gradients for transition from August to September over time. 



study. For instance, Figure 6 reveals the gradients between August and 
September decrease substantially statewide over the course of the study. 
Coupling this with the information in Table 2, which indicates that hos- 
pitalization rates in September are /3i2 — /3ii = 1.62 per 100,000 higher 
than those in August, suggests that the difference in asthma hospitaliza- 
tion rates between August and September has decreased nearly 60%, going 
from roughly 2.31 at the beginning of the period to just 0.97 by the end. 
An investigation of the raw hospitalization rates shows a similar trend, but 
this is to be expected since most of the spatiotemporal variability in the 
model is accounted for by the random effects. A similar, though not as 
striking, phenomenon occurs between March and April, where the gradients 
are increasing. As these two pairs of months lie on the transition between 
the warmer months and the cooler months, this result would seem to sug- 
gest that the effect of seasonality has moderated over the length of the 
study. 

One limitation of this analysis is that the data records asthma hospital- 
izations, not overall prevalence. This is an important distinction, as factors 
that trigger symptoms of asthma may not be the same as or have the same 
impact on asthma hospitalizations. For instance, residents of regions with 
high risk environments may be better educated about and/or prepared for 
managing their symptoms, which could lead to a relative decrease in asthma 
hospitalization rates. Another limitation is that, due to the aggregation of 
our data, we have an inconvenient interpretation of the daily estimates in 
Figure 5. A more accurate interpretation of these values is that they are the 
average daily rates for the one-month interval centered at a particular day. 
More generally, the interpretation of predicted values at any time point is 
determined by the aggregation of the data, but this is certainly not unique 
to this model. 
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8. Summary and conclusions. In this paper we have provided an overview 
of parent and gradient processes, building on previous work in spatiotempo- 
ral Gaussian process modehng. We then described our modehng framework 
and methodology that allows for inference on temporal gradients. An imple- 
mentation of this work was outlined in Section 4, and its theory was verified 
via simulation. Its use was then illustrated on a real data set in Section 7, 
where our results showed real insight can be gained from an assessment 
of temporal gradients in the residual Gaussian process, indicating overall 
trends as well as motivating a search for temporally interesting covariates 
still missing from our model (say, one that changes abruptly in San Francisco 
County around 2000). 

We believe there are two primary points of discussion regarding this work, 
the first of which is the use of modeling time as continuous. If inference is 
desired at the resolution of the data only, then several of the discrete-time 
models in the literature would be appropriate; in Appendix D of the online 
supplement to this article, we compare our methods to one such model. Of- 
tentimes, however, this is not the case, as investigators and administrators 
may seek estimates of the temporal effects on a finer scale. In our exam- 
ple, public health officials may be interested in the daily effects of asthma, 
which can be correlated with effects of daily variation of temperature and 
a variety of atmospheric pollutants. A practical issue here is that hospital- 
ization data are often more cleanly available as monthly aggregates (say, 
due to patient confidentiality issues, like those described in Section 2) and, 
even when the daily data are available, they tend to be both massive and 
very likely to have many missing values. Analyzing such data using discrete- 
time models would require methods for handling temporal misalignment, 
while our temporal process-based methods can handle such inference in a 
posterior predictive fashion. Furthermore, treating time as continuous per- 
mits inference on temporal gradients, which we feel can be an important 
tool for better understanding complex space-time data sets. In some sense, 
our modeling framework can be looked upon as generalizing the work of 
Vivar and Ferreira (2009) with a stochastic temporal process and deriving 
a tractable inferential framework for infinitesimal rates of change for that 
process. 

A second important point of discussion is the importance of significance 
with respect to these temporal gradients. We believe it depends on the prob- 
lem being modeled. While we have accounted for monthly differences in our 
design matrix, the Zi{t) here may simply be capturing the remaining cyclical 
trend, and this is why we felt it was more beneficial to focus on a side-by- 
side comparison of two of California's most populous counties, which moti- 
vated a further investigation of Los Angeles County, and the trends of the 
twelve month-to-month comparisons rather than solely on whether a spe- 
cific gradient for a particular county was significant. In situations where it's 
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reasonable to assume two time points are comparable, investigating signif- 
icant temporal gradients can indicate periods of important changes in the 
data, which may be caused by rapid changes in missing covariates. We also 
point out that the methodology for gradients outlined here can be applied 
to more general spatial functional data analysis contexts and will be espe- 
cially useful for estimating gradients from high-resolution samples of the 
function. 

Regarding the specific application of this methodology in this paper, it 
bears mentioning that modeling our data as rates is not the only option. 
Often, the counts themselves are modeled directly using a log-linear model, 
with a Poisson distributional assumption justified as a rare-events approxi- 
mation to the binomial. In this setting, however, we would no longer be able 
to rely on the closed form Gibbs Sampler for updating our random effects, 
instead requiring Metropolis updates and a substantial increase in compu- 
tational burden. Another option is to use a Freeman-Tukey transformation 
of the rates and a single error variance parameter, r^, which is scaled by 
the county's population, as shown in Freeman and Tukey (1950) and Cressie 
and Chan (1989), with the goal of justifying the assumption of normality. 
Given the population sizes we're dealing with, we believe the assumption of 
normality of our observed rates can be justified as a normal approximation 
to the binomial. Furthermore, an analysis of the transformed data results 
in nearly identical substantive findings. However, there is a drawback: by 
modeling transformed values instead of the rates themselves, we lose the 
interpretability of the scale for not only our regression parameters, but also 
the temporal gradients. In our experience, a common question among public 
health practitioners is, "What does this mean?" As such, we feel that having 
results which are straightforward to interpret is of the utmost importance 
and, thus, we chose to model the untransformed rates. Incidentally, we also 
considered modeling the untransformed rates using a model with a single er- 
ror variance parameter (scaled by population). Sadly, the simplicity of this 
model failed to outweigh its loss of flexibility and, in any case, this model 
would not be generalizable to nonrate data. 

One weakness of our model that we plan to address in the future is that, 
if the true underlying process is less smooth in some regions than others, 
or if there are spatial outliers, our model may simultaneously both over- 
smooth and undersmooth the random effects, Z. In our gradient simulation 
in Section 6, the counties of Alameda (home of Oakland) and Solano have 
significantly larger percentages of African Americans than any other county 
in the state. As a result, the true underlying process that we've constructed 
using (9) for these counties takes much more extreme values than their neigh- 
bors, resulting in oversmoothing in these counties and creating the potential 
for undersmoothing in other counties. While this issue is not unique to our 
model, this can lead to poor estimation of the temporal gradients, such as 
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biased estimates or wide credible intervals. An approach similar to the spa- 
tially adaptive CAR (SACAR) model proposed by Reich and Hodges (2008) 
offers one possible solution: replace the covariance matrix, S^, in (3) with 

(11) ^z = R{(t})^T{D-aW)-^T, 

where T is a diagonal matrix with Ta = ai. We believe by allowing each 
region to have its own variance parameter, outliers such as Alameda and 
Solano in our simulation will receive larger ai (relative to the single variance 
parameter, o", described in this paper) and, thus, will be less constrained 
by the magnitude of their neighbors. Furthermore, regions which are more 
similar to their neighbors would conceivably receive smaller ctj, allowing 
for tighter credible intervals for both the random effects and their gradi- 
ents. 

We certainly have not exhausted our modeling options from a theoreti- 
cal standpoint, either. Some of the richer association structures described 
in Appendix B of the online supplement may be appropriate in alternate 
inferential contexts. While we demonstrated the advantages of the process- 
based specifications over some simpler parametric options for Zi{t) in our 
data analysis, one could envision alternative specifications depending upon 
the inferential question at hand. For example, if interest lay in separating 
the variability between time and space using two variance parameters, ad- 
ditive specifications such as Zi{t) = Ui + w{t), where Uj's follow a Markov 
random field and w{t) is a temporal Gaussian process, could be explored. 
Now the Uj's and w{t)'s could have their own variance components. This, 
however, would not allow the temporal functions to borrow strength across 
the neighbors as effectively as we do here. 

Apart from exploring such alternate specifications, our future work in- 
cludes expanding our focus to include spatiotemporal gradients for point- 
referenced (geostatistical) data, where our response arises from a spatiotem- 
poral process Y{s;t) with s S K"^. Typically, we have a finite collection of 
sites S = {si , . . . , s„} and time points t G T = {ti , . . . , t^t } (as before) where 
the responses Y{si;tj) have been observed. Spatiotemporal gradient analy- 
sis in this setting offers richer possibilities, and of course avoids the prob- 
lems associated with the CAR model's failure to offer a true spatial process 
[Banerjee, Carlin and Gelfand (2004), pages 82-83]. Here one can concep- 
tualize spatial (directional) gradients, temporal gradients or even "mixed" 
gradients. 
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SUPPLEMENTARY MATERIAL 

Imputation of missing daily hospitalization counts, MCMC details, alter- 
native models and comparison with discrete-time models 

(DOI: 10.1214/12-AOAS600SUPP; .pdf). As data for days with between 
one and four asthma hospitalizations are missing, we impute county-specific 
values for these days using a method similar to Besag's iterated conditional 
modes method [Besag (1986)] but with means. We also lay out the details for 
the MCMC implementation, discuss more general versions of our model and 
compare our gradient estimates to finite differences from a simple discrete- 
time model. 
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